Generation that Exploits Corpus-Based Statistical Knowledge
Abstract
We describe novel aspects of a new natural language generator called Nitrogen. This generator has a highly flexible input representation that allows a spectrum of input from syntactic to semantic depth, and it shifts the burden of many linguistic decisions to the statistical post-processor. The generation algorithm is compositional, which makes it efficient, yet it also handles non-compositional aspects of language. Nitrogen's design makes it robust and scalable, operating with lexicons and knowledge bases of one hundred thousand entities.

1 Introduction

Language generation is an important subtask of applications such as machine translation, human-computer dialogue, explanation, and summarization. The recurring need for generation suggests the usefulness of a general-purpose, domain-independent natural language generator (NLG). However, "plug-in" generators available today, such as FUF/SURGE (Elhadad and Robin, 1998), MUMBLE (Meteer et al., 1987), KPML (Bateman, 1996), and CoGenTex's RealPro (Lavoie and Rambow, 1997), require inputs with a daunting amount of linguistic detail. As a result, many client applications resort instead to simpler template-based methods.

An important advantage of templates is that they sidestep linguistic decision-making and avoid the need for large, complex knowledge resources and processing. For example, the following structure could be a typical result of a database query on the type of food a venue serves:

((:obj-type venue) (:obj-name Top_of_the_Mark) (:attribute food-type) (:attrib-value American))

Using a template like <obj-name>'s <attribute> is <attrib-value>., the structure could produce the sentence "Top of the Mark's food type is American." Templates avoid the need for detailed linguistic information about lexical items, part-of-speech tags, number, gender, definiteness, tense, sentence organization, subcategorization structure, semantic relations, etc., which more general NLG methods need to have specified in the input (or must supply defaults for).
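As a rough illustration of how little machinery a template needs, the Python sketch below reads the query structure from the example and fills a slot template. The slot syntax (<obj-name> and so on), the regular expressions, and the function names parse_record and fill_template are hypothetical choices made for this sketch, not anything prescribed by the text.

```python
import re

# Hypothetical template; slot names are assumed to match the query's keywords.
TEMPLATE = "<obj-name>'s <attribute> is <attrib-value>."


def parse_record(text):
    """Read (:keyword value) pairs from an s-expression-style query result."""
    return dict(re.findall(r"\(\s*:([\w-]+)\s+([^()\s]+)\s*\)", text))


def fill_template(template, record):
    """Replace each <slot> with its value, turning '_' and '-' into spaces."""
    def substitute(match):
        return record[match.group(1)].replace("_", " ").replace("-", " ")
    return re.sub(r"<([\w-]+)>", substitute, template)


record = parse_record(
    "((:obj-type venue) (:obj-name Top_of_the_Mark) "
    "(:attribute food-type) (:attrib-value American))"
)
print(fill_template(TEMPLATE, record))  # Top of the Mark's food type is American.
```

The template never has to decide number, definiteness, or tense; it only copies values into fixed positions.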
Such information is usually not readily inferable from an application's database, nor is it always available from other sources with the needed breadth of coverage or level of detail. Thus, using a general-purpose generator can be formidable (Reiter, 1995). However, templates only work in very controlled or limited situations. They cannot provide the expressiveness, flexibility, or scalability that many real domains need.

A desirable solution is a generator that abstracts away from templates enough to provide the needed flexibility and scalability, yet still requires only minimal semantic input (and maintains reasonable efficiency). Such a generator would take on the responsibility of finding an appropriate linguistic realization for an underspecified semantic input. This solution is especially important in machine translation, where the surface syntactic organization of the source text usually differs from that of the target language, and where the deep semantics are often difficult to obtain or to represent completely. In Japanese-to-English translation, for example, it is often hard to determine from the Japanese text the number or gender of a noun phrase, the English equivalent of a verb tense, or the deep semantic meaning of sentential arguments. There are many other obvious syntactic divergences as well.

Thus, shifting such linguistic decisions to the generator is a significant help to client applications. At the same time, however, it places enormous knowledge demands on the generator program. Traditional large-scale NLG already requires immense amounts of knowledge, as does any large-scale AI enterprise. NLG operating on a scale of 200,000 entities (concepts, relations, and words) requires large and sophisticated lexicons, grammars, ontologies, collocation lists, and morphological tables. Acquiring and applying accurate, detailed knowledge of this breadth poses difficult problems.
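To make the idea of handing a decision such as number to a statistical component concrete, here is a minimal overgenerate-and-rank sketch in Python: an input that leaves number unspecified yields several candidate strings, and an add-one-smoothed bigram model trained on a tiny corpus picks the most probable one. The toy corpus, the function names, and the choice of a smoothed bigram model are all assumptions made for illustration; this is not presented as Nitrogen's actual algorithm or data.

```python
import math
from collections import Counter
from itertools import product


def train_bigram_model(corpus):
    """Count context tokens and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len({t for s in corpus for t in s.lower().split()}) + 2  # + <s>, </s>
    return contexts, bigrams, vocab_size


def score(sentence, model, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a candidate sentence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def realize(slots, model):
    """Expand every combination of slot alternatives; return the best-scoring one."""
    candidates = [" ".join(choice) for choice in product(*slots)]
    return max(candidates, key=lambda c: score(c, model))


# Toy corpus, invented for illustration only (assumption, not data from the paper).
corpus = ["the dog barks", "the dogs bark", "a dog barks"]
model = train_bigram_model(corpus)
# Number is unspecified in the input, so both noun and verb forms are offered.
slots = [["the"], ["dog", "dogs"], ["barks", "bark"]]
print(realize(slots, model))  # the dog barks
```

Enumerating every combination with itertools.product grows exponentially with the number of ambiguous slots, so a realistic system would need a more compact way to represent and score alternatives; the sketch ignores that cost.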
(Knight and Hatzivassiloglou, 1995) suggested
Similar resources
Selecting Domain-Specific Concepts for Question Generation With Lightly-Supervised Methods
In this paper we propose content selection methods for question generation (QG) which exploit domain knowledge. Traditionally, QG systems apply syntactical transformation on individual sentences to generate open domain questions. We hypothesize that a QG system informed by domain knowledge can ask more important questions. To this end, we propose two lightly-supervised methods to select salient...
Statistical Models for Organizing Semantic Options in Knowledge Editing Interfaces
This paper describes the design and empirical evaluation of statistical models that use domain and lexical knowledge to organize new semantic options in interfaces for editing knowledge bases. We employ the models in a system that allows a domain expert to perform language-neutral knowledge editing by interacting with natural language text generated by a natural language generation system. This ...
The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners
This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...
Forest-Based Statistical Sentence Generation
This paper presents a new approach to statistical sentence generation in which alternative phrases are represented as packed sets of trees, or forests, and then ranked statistically to choose the best one. This representation offers advantages in compactness and in the ability to represent syntactic information. It also facilitates more efficient statistical ranking than a previous approach to ...
Entrepreneurial University, a Necessity for Knowledge-Based Economy; Evaluation and Explanation of Entrepreneurial Capacity of University of Mazandaran
Entrepreneurship is an idea and a process in which an individual or group identifies new opportunities and exploits them successfully. Universities as knowledge producing and knowledge distributing institutions play a greater role in industrial innovation. In fact, the emergence of entrepreneurial university is a response to the increasing importance of knowledge in national and regional system...
Publication date: 1998